Word Co-occurrence Augmented Topic Model in Short Text

Authors

  • Guan-Bin Chen
  • Hung-Yu Kao
Abstract

Topic models learn topics based on the amount of word co-occurrence in the documents. Word co-occurrence measures how often two words appear together. BTM discovers topics from bi-terms across the whole corpus to overcome the lack of local word co-occurrence information. However, BTM causes common words to be over-represented, because BTM identifies word co-occurrence information by the bi-term...

The 2015 Conference on Computational Linguistics and Speech Processing (ROCLING 2015), pp. 164-166. The Association for Computational Linguistics and Chinese Language Processing.
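The bi-term idea in the abstract can be illustrated with a minimal sketch. It assumes a bi-term is an unordered pair of words drawn from the same short text, as in the usual description of BTM; the function names and the toy corpus are hypothetical and are not taken from the paper. The sketch only counts bi-terms and does not perform topic inference.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair from one short text (assumed bi-term definition)."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(docs):
    """Pool bi-terms over the whole corpus and count how often each one occurs."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


def word_participation(biterm_counts):
    """Count how many bi-term occurrences each word takes part in."""
    totals = Counter()
    for (first, second), n in biterm_counts.items():
        totals[first] += n
        totals[second] += n
    return totals


# Hypothetical toy corpus of three short, already tokenized texts.
docs = [
    ["the", "apple", "phone"],
    ["the", "apple", "pie"],
    ["the", "phone", "battery"],
]

biterm_counts = count_biterms(docs)
totals = word_participation(biterm_counts)

for word, n in totals.most_common():
    print(word, n)
```

In this toy corpus the common word "the" takes part in more bi-term occurrences than any content word, which is one way to picture the over-representation of common words that the abstract attributes to counting bi-terms.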

Similar resources

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A large amount of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

Topic Sentiment Joint Model with Word Embeddings

The topic sentiment joint model is an extended model that aims to detect sentiments and topics simultaneously from online reviews. Most existing topic sentiment joint modeling algorithms infer the resulting distributions from the co-occurrence of words. But when the training corpus is short and small, the resulting distributions might not be very satisfying. In this pape...

Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions

In this paper, we propose a novel topic model based on incorporating dictionary definitions. Traditional topic models treat words as surface strings without assuming predefined knowledge about word meaning. They infer topics only by observing surface word co-occurrence. However, the co-occurred words may not be semantically related in a manner that is relevant for topic coherence. Exploiting di...

Topic Modeling over Short Texts by Incorporating Word Embeddings

Inferring topics from the overwhelming amount of short texts becomes a critical but challenging task for many content analysis tasks, such as content charactering, user interest profiling, and emerging topic detecting. Existing methods such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) cannot solve this problem very well since only very limited word co-o...

Intensity of Relationship Between Words: Using Word Triangles in Topic Discovery for Short Texts

Uncovering latent topics from given texts is an important task that helps people cope with information overload. This has driven active research on topic models. However, most texts available daily are short, so traditional topic models may not perform well because of data sparsity. Popular models for short texts concentrate on word co-occurrence patterns in the corpus. However, they do no...

Journal title:
  • IJCLCLP

Volume 20

Publication date: 2015